An Algorithm Rapidly Segmenting Chinese Sentences into Individual Words

Authors

Abstract

Similar resources

Segmenting DNA sequence into "words"

This paper presents a novel method to segment/decode DNA sequences based on a statistical language model. First, by analyzing the genomes of 12 model species, we find that the length of most DNA “words” is 12 to 15 bp. Then we apply an unsupervised approach to build the DNA vocabulary and design a DNA sequence segmentation method. We also find that different genomes are likely to use the similar...

Segmenting unrestricted Chinese text into prosodic words instead of lexical words

This paper stresses the importance of converting a string of lexical words into a string of prosodic words in TTS systems by presenting the surface and perceptual differences between them. A statistical rule-based method and a CART-based method are proposed as solutions. Although the ComplicatedSet-based CART method performs best, this comes at the cost of heavy computation...

Segmenting Chinese Unknown Words by Heuristic Method

Chinese text segmentation is important in Chinese text indexing. Due to the lack of word delimiters in Chinese text, Chinese text segmentation is more difficult than English text segmentation. In addition, segmentation ambiguities and out-of-vocabulary words (i.e., unknown words) are the major challenges in Chinese segmentation. Many research works dealing with the problem of ...

Segmenting Sentences into Linky Strings Using D-bigram Statistics

It is obvious that segmentation plays an important role in natural language processing (NLP), especially for languages whose sentences are not easily separated into morphemes. In this study we propose a method of segmenting a sentence. The system described in this paper does not use any grammatical information or knowledge in processing. Instead, it uses statistical information drawn from n...

An Unsupervised Algorithm for Segmenting Categorical Timeseries into Episodes

This paper describes an unsupervised algorithm for segmenting categorical time series into episodes. The Voting-Experts algorithm first collects statistics about the frequency and boundary entropy of n-grams, then passes a window over the series and has two “expert methods” decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in four language...
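
The Voting-Experts procedure summarized in the abstract above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the function names (`ngram_stats`, `standardize`, `voting_experts`), the default window of 4, the vote threshold of 2, standardizing scores separately for each n-gram length, considering only split points strictly inside the window, and the local-maximum rule for placing boundaries are all assumptions made for this sketch. The input is assumed to be a string.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev


def ngram_stats(seq, max_n):
    """Count n-grams up to max_n and compute each n-gram's boundary entropy."""
    freq = Counter()
    followers = defaultdict(Counter)  # n-gram -> counts of the symbol that follows it
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            gram = seq[i:i + n]
            freq[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    # Boundary entropy: entropy of the next-symbol distribution after each n-gram.
    entropy = {}
    for gram, nxt in followers.items():
        total = sum(nxt.values())
        entropy[gram] = -sum(c / total * math.log2(c / total) for c in nxt.values())
    return freq, entropy


def standardize(table):
    """Convert raw scores to z-scores, separately per n-gram length (assumption)."""
    by_len = defaultdict(list)
    for gram, value in table.items():
        by_len[len(gram)].append(value)
    moments = {n: (mean(vals), pstdev(vals)) for n, vals in by_len.items()}
    z = {}
    for gram, value in table.items():
        mu, sd = moments[len(gram)]
        z[gram] = (value - mu) / sd if sd > 0 else 0.0
    return z


def voting_experts(seq, window=4, threshold=2):
    """Hypothetical simplified Voting-Experts: split where the experts' votes peak."""
    if window < 2:
        raise ValueError("window must be at least 2")
    if not seq:
        return []
    freq, ent = ngram_stats(seq, window)
    z_freq, z_ent = standardize(freq), standardize(ent)
    votes = [0] * (len(seq) + 1)  # votes[i] = votes for a boundary before seq[i]
    splits = range(1, window)      # split points strictly inside the window
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        # Frequency expert: the split whose two halves are most frequent (in z-scores).
        k_freq = max(splits, key=lambda k: z_freq.get(w[:k], 0.0) + z_freq.get(w[k:], 0.0))
        # Entropy expert: the split after the prefix whose next symbol is least predictable.
        k_ent = max(splits, key=lambda k: z_ent.get(w[:k], 0.0))
        votes[start + k_freq] += 1
        votes[start + k_ent] += 1
    # Keep positions whose vote count is a local maximum and reaches the threshold.
    cuts = [i for i in range(1, len(seq))
            if votes[i] >= threshold and votes[i] > votes[i - 1] and votes[i] >= votes[i + 1]]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


text = "itwasthebestoftimesitwastheworstoftimes" * 3
print(voting_experts(text, window=4))
```

The output is a list of chunks whose concatenation is the original string; on real data the window size and vote threshold would need tuning, and the two experts could be weighted differently.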

Journal

Journal title: MATEC Web of Conferences

Year: 2019

ISSN: 2261-236X

DOI: 10.1051/matecconf/201926704001